HITTING AND RETURNING INTO RARE EVENTS FOR ALL 
ALPHA-MIXING PROCESSES 



MIGUEL ABADI AND BENOIT SAUSSOL 

Abstract. We prove that for any α-mixing stationary process the hitting time of any n-string $A_n$ converges, when suitably normalized, to an exponential law. We identify the normalization constant $\lambda(A_n)$. A similar statement also holds for the return time.

To establish this result we prove two other results of independent interest. First, we show a relation between the rescaled hitting time and the rescaled return time, generalizing a theorem of Haydn, Lacroix and Vaienti. Second, we show that for positive entropy systems, the probability of observing any n-string in n consecutive observations goes to zero as n goes to infinity.



1. Introduction 

The study of the statistical properties of the time elapsed until the occurrence of an observable of positive measure in a stationary stochastic process and/or in a measure preserving dynamical system is a classical subject. The starting point of this study is the famous Poincaré Recurrence Theorem, which states that in an ergodic system any set of positive measure appears in the process infinitely many times. This is a qualitative result, in the sense that no statistical properties of these returns are established. In the last twenty years many notions of return have been introduced and studied. These notions depend on the initial conditions, the observed set, and the measure of the system. There has been intensive interest in their statistical properties as models of physical phenomena like intermittency and metastability. The applications were then extended to other areas such as biology, linguistics and computer science, to describe phenomena like gene occurrence in DNA and protein sequences, the rhythm of a language and data compression algorithms, to mention some of them.

In the present paper we consider a fixed set $A$ of positive measure $\mu(A)$ in an ergodic system. When the evolution starts outside $A$, the time elapsed until the first occurrence of the set is referred to as the hitting time of $A$. When the evolution starts inside $A$, this time is referred to as the return time to $A$.

Our main result is that under the so-called α-mixing (or strong mixing) condition, the distribution of the hitting time of a set $A$ can be well approximated by an exponential law. The approximation is in the supremum norm in the space of distribution functions. Although the exponential law is a classical subject, our result is new in the following respects:

a) Our result holds for any cylinder set, namely, around any point, including periodic points, and not just around generic points.

b) The result holds for any α-mixing system, while the best previous works [1] assumed a polynomial mixing rate with power at least $(1+\sqrt{5})/2$. Moreover, this strong-mixing condition is the weakest among many types of mixing conditions, such as ψ-, φ-, ρ- and β-mixing (absolute regularity) and I-mixing (information regularity). See Bradley [2].
c) We also show that the exponential law holds when considering not just a cylinder set but even a set which is a union of cylinders. Moreover, the cardinality of this union can be exponentially large with respect to the length of the cylinders.

Following the approach of Galves and Schmitt [6], we get that the parameter of the exponential law is the product $\lambda(A)\mu(A)$, where $\lambda(A)$ is a positive number related to the short recurrence properties of the set $A$. For a description of these properties see Abadi [1]. In [6], Galves and Schmitt show that for ψ-mixing systems there exist two positive constants $K, K'$ such that $K \le \lambda(A) \le K'$. In our case no such lower constant $K$ exists, and one can have $\lambda(A)$ arbitrarily small.

We prove our result by establishing two other results which are of independent interest. In the first one, we establish an ergodic relationship between the rescaled hitting time $\lambda(A)\mu(A)\tau_A$ and the equally rescaled return time. The idea of this result comes from a paper of Haydn, Lacroix and Vaienti [8], which established such a relationship for the hitting and return times rescaled by $\mu(A)$ alone. Their result does not apply in general to our case, since one can have $\lambda(A) \ne 1$, for instance around periodic points. Our proof also follows a different approach.

The second result mentioned above reads as follows. The probability of observing an $n$-cylinder, or even a union of them, in $n$ consecutive observations goes to zero with $n$ for α-mixing systems. Moreover, we show that the convergence is uniform in $A$: it only depends on the cardinality of the union, not on the choice of the cylinders. This is natural when the measure of the set decays, e.g., exponentially with $n$. But it is far from obvious, and maybe even counter-intuitive, when the measure decays just polynomially fast with power less than one, a case which is covered by our result.

2. Statement of the results 

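Let $\mathcal{A}$ be a finite or countable set and let $\Sigma = \mathcal{A}^{\mathbb{N}}$ be the set of sequences. We endow $\Sigma$ with the shift map $T$. Given non-negative integers $m \le n$ and a point $x \in \Sigma$ we denote by $[x_m \dots x_n]$ the cylinder of rank $(m, n)$ containing $x$, that is

$$[x_m \dots x_n] := \{ y \in \Sigma : y_m = x_m, \dots, y_n = x_n \}.$$

A cylinder of rank $(0, n-1)$ will simply be called a cylinder of rank $n$. We denote by $\mathcal{C}_m^n$ the collection of cylinders of rank $(m, n)$ and by $\mathcal{F}_m^n$ the σ-algebra generated by the partition $\mathcal{C}_m^n$; for $m \ge 0$, $\mathcal{F}_m^\infty$ denotes the σ-algebra generated by $\bigcup_{n \ge m} \mathcal{F}_m^n$. Let $\mathcal{F}$ be the σ-algebra generated by the $\mathcal{F}_0^n$'s and let $\mu$ be a $T$-invariant probability measure on $(\Sigma, \mathcal{F})$. Let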

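$$\alpha(g) := \sup_{m \ge 0} \; \sup_{A \in \mathcal{F}_0^m,\ B \in \mathcal{F}_{m+g}^\infty} \left| \mu(A \cap B) - \mu(A)\mu(B) \right|$$

for any integer $g \ge 0$. We assume that the system $(\Sigma, T, \mu)$ is α-mixing, in the sense that $\alpha(g) \to 0$ as $g \to \infty$. This notion of mixing is weaker than both φ-mixing and ψ-mixing. We emphasize that we do not assume any summability condition on the sequence $\alpha(g)$.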
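As a sanity check on this definition (an illustration added here, not part of the original argument): if $\mu$ is a Bernoulli (product) measure, then for every $m$ and every $g \ge 1$ the σ-algebras $\mathcal{F}_0^m$ and $\mathcal{F}_{m+g}^\infty$ depend on disjoint sets of coordinates and are therefore independent, so that

```latex
\mu(A \cap B) = \mu(A)\,\mu(B)
  \quad \text{for all } A \in \mathcal{F}_0^m,\ B \in \mathcal{F}_{m+g}^\infty
  \qquad \Longrightarrow \qquad
  \alpha(g) = 0 \quad (g \ge 1).
```

Thus i.i.d. processes are α-mixing with the fastest possible rate, whereas the results below require no rate at all.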

Let $A \subset \Sigma$ be a measurable set. We define the hitting time to $A$ by

$$\tau_A(x) = \inf\{k \ge 1 : T^k x \in A\}, \quad x \in \Sigma.$$

We are interested in the distribution of the hitting time $\tau_A$ on the probability space $(\Sigma, \mu)$, and in that of the return time, defined by the same formula but on the probability space $(A, \mu(\cdot \mid A))$, where $\mu(\cdot \mid A)$ denotes the conditional measure on $A$.
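To make the two definitions concrete, here is a minimal Python sketch that evaluates $\tau_A$ on a finite observed prefix of a sequence, for a cylinder $A = [w_0 \dots w_{n-1}]$. The helper names (`in_cylinder`, `hitting_time`, `return_time`) are hypothetical, and returning `None` when $A$ is not seen within the prefix is our own convention for truncated data; the return time is simply the same formula evaluated at a point of $A$.

```python
from typing import Optional, Sequence


def in_cylinder(x: Sequence[int], k: int, word: Sequence[int]) -> bool:
    """True iff T^k x lies in the cylinder [word], i.e. x_k ... x_{k+n-1} == word."""
    n = len(word)
    return k + n <= len(x) and list(x[k:k + n]) == list(word)


def hitting_time(x: Sequence[int], word: Sequence[int]) -> Optional[int]:
    """tau_A(x) = inf{k >= 1 : T^k x in A} for A = [word], on a finite prefix x.

    Returns None when A is not hit inside the observed prefix (a convention
    for truncated data only).
    """
    for k in range(1, len(x) - len(word) + 1):
        if in_cylinder(x, k, word):
            return k
    return None


def return_time(x: Sequence[int], word: Sequence[int]) -> Optional[int]:
    """Same formula as hitting_time, but only meaningful for x in A."""
    if not in_cylinder(x, 0, word):
        raise ValueError("the return time is defined for points of A only")
    return hitting_time(x, word)


if __name__ == "__main__":
    x = [0, 1, 1, 0, 1, 1, 1]
    print(hitting_time(x, [1, 1]))  # prints 1: x_1 x_2 = 1 1
    print(return_time(x, [0, 1]))   # prints 3: x starts with 0 1, next 0 1 starts at k = 3
```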

Theorem 1. Suppose that the system $(\Sigma, T, \mu)$ is α-mixing. Then for any sequence $A_n \in \mathcal{F}_0^{n-1}$ such that

(1) $\mu(\tau_{A_n} \le n) \to 0$ as $n \to \infty$,

there exists some normalizing constant $\lambda(A_n) > 0$ such that the following holds:

• the hitting time to $A_n$, rescaled by $\lambda(A_n)\mu(A_n)$, converges in distribution to an exponential distribution. Namely,

$$\sup_{t \ge 0} \left| \mu\big(\lambda(A_n)\mu(A_n)\tau_{A_n} > t\big) - e^{-t} \right| \to 0 \quad \text{as } n \to \infty.$$

The convergence is uniform on families of sets $A_n$ for which the convergence in (1) is uniform.

• the distribution of the return time is approximated by a convex combination of a Dirac mass at zero and an exponential distribution. More precisely,

$$\sup_{t \ge s} \left| \lambda(A_n)^{-1} \mu\big(\lambda(A_n)\mu(A_n)\tau_{A_n} > t \,\big|\, A_n\big) - e^{-t} \right| \to 0 \quad \text{as } n \to \infty,$$

for any $s > 0$.

• we have $\limsup_n \lambda(A_n) \le 1$.

The normalizing constant $\lambda(A_n)$ may not converge in general; thus we cannot simply say that the limiting distribution of the rescaled return time exists. Moreover, even if it converges, the limit need not be equal to one. For example, a case of interest is when $\lim \lambda(A_n) = 0$: we still get a non-trivial exponential approximation, while without the extra factor $\lambda(A_n)$ one would just obtain the rough statement that the rescaled hitting time $\mu(A_n)\tau_{A_n} \to +\infty$ and the rescaled return time $\mu(A_n)\tau_{A_n} \to 0$ in distribution.
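The role of $\lambda(A_n)$ can be seen numerically. The following Monte Carlo sketch is an illustration under our own assumptions, not taken from the paper: a fair-coin Bernoulli process (α-mixing, with $\alpha(g) = 0$ for $g \ge 1$) and hypothetical helper names. It estimates $\mu(\mu(A)\tau_A > 1)$ for two cylinders of rank 8. For the non-self-overlapping word `00000001` the estimate should be close to $e^{-1}$, i.e. $\lambda(A) \approx 1$. For the periodic word `00000000` an occurrence is followed immediately by another one with probability 1/2, so occurrences come in clusters of mean size 2 and, heuristically, $\lambda(A) \approx 1/2$; the estimate should then be close to $e^{-1/2}$ rather than $e^{-1}$.

```python
import math
import random


def hitting_time(word: str, rng: random.Random, max_steps: int = 10**7) -> int:
    """First k >= 1 with x_k ... x_{k+n-1} == word, for x a sequence of fair coin flips.

    The current window of n symbols is stored as an n-bit integer whose most
    significant bit is the oldest symbol, so moving one step is a bit shift.
    """
    n = len(word)
    mask = (1 << n) - 1
    target = int(word, 2)
    window = rng.getrandbits(n)  # x_0 ... x_{n-1}
    for k in range(1, max_steps):
        # drop x_{k-1} and append x_{k+n-1}: the window now encodes x_k ... x_{k+n-1}
        window = ((window << 1) & mask) | rng.getrandbits(1)
        if window == target:
            return k
    raise RuntimeError("word not observed within max_steps")


def rescaled_survival(word: str, t: float, samples: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of mu(mu(A) * tau_A > t) for the cylinder A = [word]."""
    rng = random.Random(seed)
    mu_a = 2.0 ** (-len(word))  # fair-coin measure of a cylinder of rank len(word)
    hits = (hitting_time(word, rng) for _ in range(samples))
    return sum(mu_a * tau > t for tau in hits) / samples
```

Calling `rescaled_survival("00000001", 1.0)` and `rescaled_survival("00000000", 1.0)` and comparing with `math.exp(-1.0)` and `math.exp(-0.5)` shows the two different normalizations; at rank 8 we expect a small residual gap coming from finite-$n$ effects.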

In the next section we show that the hypothesis of the theorem holds for a broad class of sequences of sets $A_n$.

3. Rare events do not appear too soon 

We present some explicit examples of sequences $A_n$ to which Theorem 1 applies, that is, for which condition (1) of the theorem is satisfied. They are consequences of Proposition 6 presented below.

The first example was the motivation of our work: 

Example 2. For any $a \in \Sigma$ the sequence of cylinders $A_n = [a_0, \dots, a_{n-1}]$ satisfies the hypothesis of Theorem 1. Moreover, the convergence is uniform in $a$.

We emphasize that this approximation by an exponential distribution is valid for any point $a \in \Sigma$, including, for example, periodic points. This generalizes the result in [7], which concerns almost every sequence $a$.

Returns to the cylinder $[a_0, \dots, a_{n-1}]$ in the example above mean that there is a perfect matching of the first $n$ symbols. It turns out that for some applications approximate matching is more interesting:
Example 3. Approximate matching: Let $a \in \Sigma$ and $D \in (0, 1)$. For $b \in \Sigma$ denote by $d_n(a, b) = \operatorname{card}\{i \le n-1 : a_i \ne b_i\}$ the Hamming distance of the first $n$ symbols. Let

$$A_n = \{ b \in \Sigma : d_n(a, b) \le Dn \}$$

be the D% approximate matching of $[a_0, \dots, a_{n-1}]$. Then there exists $D_0 > 0$ such that for all $D \in (0, D_0)$ the sequence $A_n$ satisfies the hypothesis of Theorem 1.
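A minimal sketch of the approximate-matching set of Example 3, in Python and for illustration only (the function names are ours). Symbols can be any comparable objects, for instance the DNA letters discussed below.

```python
from typing import Sequence


def hamming(a: Sequence, b: Sequence, n: int) -> int:
    """d_n(a, b) = card{i <= n - 1 : a_i != b_i}."""
    return sum(1 for i in range(n) if a[i] != b[i])


def in_approx_match(a: Sequence, b: Sequence, n: int, D: float) -> bool:
    """Membership of b in A_n = {b : d_n(a, b) <= D n}."""
    return hamming(a, b, n) <= D * n


if __name__ == "__main__":
    a = "ACGTACGT"
    b = "ACGAACGT"  # differs from a in one of the first 8 positions
    print(hamming(a, b, 8))               # prints 1
    print(in_approx_match(a, b, 8, 0.2))  # 1 <= 1.6, prints True
    print(in_approx_match(a, b, 8, 0.1))  # 1 <= 0.8 fails, prints False
```

Membership in $A_n$ depends only on the first $n$ symbols of $b$, so $A_n$ is a union of cylinders of rank $n$, which is what the counting argument in the proof relies on.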

In DNA sequence analysis the alphabet $\mathcal{A}$ is $\{A, C, G, T\}$. For some sequences the entropy is estimated from below by 1.7 bits per symbol (for example the human gene HuMRETBLAS), which means $h_\mu(T) \ge 1.7 \ln 2$. This gives a value of $D_0 \approx 41\%$.
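The numerical value of $D_0$ can be checked with a short computation. The sketch below is our own: the function name and the grid-plus-bisection strategy are illustrative choices. It finds the smallest root in $(0, 1)$ of $\ln(1 + D(\operatorname{card}\mathcal{A} - 1)) - D \ln D = h$, which is the logarithm of the equation defining $D_0$ in the proof of Example 3. With $\operatorname{card}\mathcal{A} = 4$ and $h = 1.7 \ln 2$ it returns roughly $0.42$, close to the value $D_0 \approx 41\%$ quoted above.

```python
import math


def smallest_d0(card: int, h: float, step: float = 1e-3) -> float:
    """Smallest D in (0, 1) with log(1 + D (card - 1)) - D log D = h (h in nats).

    The left-hand side tends to 0 as D -> 0+, so we scan a grid for the
    first sign change and then refine it by bisection.
    """
    def f(D: float) -> float:
        return math.log1p(D * (card - 1)) - D * math.log(D) - h

    lo = step
    if f(lo) >= 0:
        raise ValueError("h is too small for this grid step")
    while True:
        hi = lo + step
        if hi >= 1.0:
            raise ValueError("no root in (0, 1)")
        if f(hi) >= 0:
            break
        lo = hi
    for _ in range(60):  # bisection on [lo, hi], where f(lo) < 0 <= f(hi)
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(smallest_d0(card=4, h=1.7 * math.log(2)))
```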

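Proof. We count the number $k_n$ of cylinders of rank $n$ which compose the D% approximate matching $A_n$. Since $D^{k - Dn} \ge 1$ whenever $k \le Dn$, we have

$$k_n = \sum_{k=0}^{\lfloor Dn \rfloor} \binom{n}{k} (\operatorname{card}\mathcal{A} - 1)^k \le \sum_{k=0}^{n} \binom{n}{k} (\operatorname{card}\mathcal{A} - 1)^k D^{k - Dn} = \left( \frac{1 + D(\operatorname{card}\mathcal{A} - 1)}{D^D} \right)^n.$$

We choose $D_0 > 0$ as the smallest solution of $(1 + D(\operatorname{card}\mathcal{A} - 1))/D^D = e^{h_\mu(T)}$, and then Proposition 6 applies for any $D < D_0$. □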
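The counting bound can be checked numerically for small $n$. The sketch below (illustrative; the names are ours) computes $k_n$ exactly with integer arithmetic and compares it with $\left((1 + D(\operatorname{card}\mathcal{A} - 1))/D^D\right)^n$.

```python
import math


def k_n(n: int, card: int, D: float) -> int:
    """Number of rank-n cylinders in A_n: words within Hamming distance D n of a fixed word."""
    return sum(math.comb(n, k) * (card - 1) ** k for k in range(math.floor(D * n) + 1))


def counting_bound(n: int, card: int, D: float) -> float:
    """Upper bound ((1 + D (card - 1)) / D**D) ** n from the proof of Example 3."""
    return ((1 + D * (card - 1)) / D ** D) ** n


if __name__ == "__main__":
    for n in (10, 50, 100):
        exact, bound = k_n(n, 4, 0.3), counting_bound(n, 4, 0.3)
        # exponential growth rates: (1/n) log k_n never exceeds (1/n) log(bound)
        print(n, exact <= bound, math.log(exact) / n, math.log(bound) / n)
```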

Example 4. For a set $K \subset \Sigma$ define its topological entropy by

$$h_{top}(K) = \limsup_{n \to \infty} \frac{1}{n} \log \operatorname{card}\{ C : C \text{ cylinder of rank } n \text{ s.t. } C \cap K \ne \emptyset \}.$$

Denote by $\mathcal{C}_0^{n-1}(K)$ the union of those cylinders $C$ of rank $n$ such that $C \cap K \ne \emptyset$. The sequence $A_n = \mathcal{C}_0^{n-1}(K)$, under the assumption that $h_{top}(K) < h_\mu(T)$, satisfies the hypothesis of Theorem 1.
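As an illustration of Example 4 (our own choice of $K$, not from the paper), take the binary alphabet and let $K$ be the set of sequences with no two consecutive 1s. The rank-$n$ cylinders meeting $K$ are the admissible words of length $n$, whose number is a Fibonacci number, so $h_{top}(K) = \log\frac{1+\sqrt{5}}{2} \approx 0.481$, which is smaller than the entropy $\log 2$ of the fair-coin Bernoulli measure. The sketch counts these words by a two-state recursion and shows the growth rate approaching that limit.

```python
import math


def count_admissible(n: int) -> int:
    """Number of binary words of length n >= 1 with no two consecutive 1s."""
    end0, end1 = 1, 1  # admissible words of length 1 ending in 0 / in 1
    for _ in range(n - 1):
        # a 0 can follow any word; a 1 can only follow a word ending in 0
        end0, end1 = end0 + end1, end0
    return end0 + end1


def growth_rate(n: int) -> float:
    """(1/n) log card{rank-n cylinders meeting K}."""
    return math.log(count_admissible(n)) / n


if __name__ == "__main__":
    golden = (1 + math.sqrt(5)) / 2
    for n in (10, 100, 1000):
        print(n, growth_rate(n), math.log(golden))
```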

Example 5. Suppose that $A_n = A_n^0 \cup A_n^1$, where $A_n^0$ and $A_n^1$ are $\mathcal{F}_0^{n-1}$-measurable sets such that $\lim_n n\mu(A_n^1) = 0$ and $A_n^0$ satisfies the conditions of Example 4 above. Then, we have

$$\mu(\tau_{A_n} \le n) \le \mu(\tau_{A_n^0} \le n) + \mu(\tau_{A_n^1} \le n) \le \mu(\tau_{A_n^0} \le n) + n\mu(A_n^1) \to 0,$$

therefore the hypothesis of Theorem 1 is satisfied.

We emphasize that, in this example, the exponential growth rate of the number of $n$-cylinders inside $A_n$ is not a priori bounded by the entropy of the measure, contrary to the preceding example.

Proposition 6. Suppose that $(\Sigma, T, \mu)$ is an ergodic measure preserving system, not necessarily α-mixing. Let $(K_n)$ be a sequence of integers such that

$$\limsup_{n} \frac{1}{n} \log K_n < h_\mu(T).$$

Then there exists a sequence $\varepsilon_n \to 0$ such that, for any $A_n \in \mathcal{F}_0^{n-1}$ which is the union of at most $K_n$ cylinders of rank $n$, we have

$$\mu(\tau_{A_n} \le n) \le \varepsilon_n.$$
We emphasize that the bound $\varepsilon_n$ does not depend on the particular set $A_n$ but only on the number of cylinders which compose it. Note that the statement $\mu(\tau_{A_n} \le n) \to 0$ is trivial whenever $\mu(A_n) \ll 1/n$, since $\mu(\tau_{A_n} \le n) \le n\mu(A_n)$. However, even for α-mixing systems, there can exist cylinders $A_n$ of rank $n$ such that $\mu(A_n) \gg 1/n$
(See E). 

When the system is α-mixing, the measure preserving transformation $(T, \mu)$ is an exact endomorphism and in particular its entropy $h_\mu(T)$ is positive (we refer to [3] for details). Hence Proposition 6 applies under the mixing hypothesis of Theorem 1.

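Proof. Set $h_0 := \limsup_n \frac{1}{n} \log K_n$, and let $h \in (h_0, h_\mu(T))$ and $k \in \mathbb{N}$ be such that $h_0 < (1 - 1/k)h$. Let

$$\Gamma(N) := \{ x : \forall n \ge N,\ \mu([x_0 \dots x_{n-1}]) \le e^{-nh} \}.$$

By the Shannon-McMillan-Breiman theorem, $\mu(\Gamma(N)) \to 1$ as $N \to \infty$. Given an integer $n$, let $m = \lceil n/k \rceil$ be the smallest integer such that $km \ge n$. First, observe that by invariance we have

(2) $\mu(\tau_{A_n} \le n) \le k\,\mu(\tau_{A_n} \le m)$.

Let $U_n = \bigcup_{i=0}^{m-1} T^i A_n$. Since $T^{-j}A_n \subset T^{-m}T^{m-j}A_n$ for $1 \le j \le m$, we have $\{\tau_{A_n} \le m\} \subset T^{-m} U_n$, hence

(3) $\mu(\tau_{A_n} \le m) \le \mu(U_n)$.

Moreover, since each $T^i A_n$ is contained in a union of at most $K_n$ cylinders of rank $n - i$, the set $U_n$ is contained in a union of at most $mK_n$ cylinders of rank $n - m$; therefore

$$\mu(U_n \cap \Gamma(n-m)) \le m K_n e^{-(n-m)h}.$$

On the other hand,

$$\mu(U_n \setminus \Gamma(n-m)) \le 1 - \mu(\Gamma(n-m)).$$

Both bounds tend to zero: the first because $h_0 < (1 - 1/k)h$ and $n - m \ge (1 - 1/k)n - 1$, the second because $n - m \to \infty$. Setting $\varepsilon_n$ equal to $k$ times the sum of the last two upper bounds proves the proposition in view of (2) and (3). □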
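For a Bernoulli measure the Shannon-McMillan-Breiman theorem used above reduces to the law of large numbers, which gives a quick way to see why $\mu(\Gamma(N)) \to 1$. The sketch below is an illustration under that assumption (a Bernoulli($p$) process; the function names are ours): along one sample path it computes $-\frac{1}{n}\log\mu([x_0 \dots x_{n-1}])$, which should be close to the entropy $-p\log p - (1-p)\log(1-p)$ for large $n$.

```python
import math
import random


def bernoulli_entropy(p: float) -> float:
    """Entropy (in nats) of the Bernoulli(p) shift."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def smb_estimate(p: float, n: int, seed: int = 0) -> float:
    """-(1/n) log mu([x_0 ... x_{n-1}]) along one Bernoulli(p) sample path."""
    rng = random.Random(seed)
    log_mu = 0.0
    for _ in range(n):
        one = rng.random() < p  # draw x_i, equal to 1 with probability p
        log_mu += math.log(p if one else 1 - p)
    return -log_mu / n


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        print(n, smb_estimate(0.3, n), bernoulli_entropy(0.3))
```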

4. Proof of the main theorem 

Our main theorem will be a direct application of the following explicit estimate of the difference between the hitting time statistics and the exponential distribution.

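Theorem 7. Suppose that the system $(\Sigma, T, \mu)$ is α-mixing. Let $n$ be an integer. For any $A \in \mathcal{F}_0^{n-1}$ there exists some constant $\lambda(A) \in (0, 2]$ such that

$$\sup_{k \in \mathbb{N}} \left| \mu(\tau_A > k) - e^{-\lambda(A)\mu(A)k} \right| \le 12\sqrt{2\mu(\tau_A \le n) + \alpha(n)}.$$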

The value of the upper bound is not intended to be optimal; it is just there to emphasize that it does not depend on the particular choice of the set $A \in \mathcal{F}_0^{n-1}$, but only on the probability of short hitting times $\mu(\tau_A \le n)$ (and on $\alpha(n)$).

In the proof of the theorem we make use of the following lemma. 

Lemma 8. Let $n$ be an integer. For any $A \in \mathcal{F}_0^{n-1}$ such that

$$\delta := 3\sqrt{2\mu(\tau_A \le n) + \alpha(n)} < 1/4,$$

there exists an integer $s > 2n$ such that

(4) $$\mu(\tau_A \le s) \le \delta \quad \text{and} \quad \frac{\mu(\tau_A \le 2n) + \alpha(n)}{\mu(\tau_A \le s - 2n)} \le \delta.$$

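Proof. Let us define $d = 2\mu(\tau_A \le n) + \alpha(n)$, so that $\delta = 3\sqrt{d}$. By hypothesis $d < 1/144$. By invariance $\mu(\tau_A \le 2n) \le 2\mu(\tau_A \le n)$, hence

$$\mu(\tau_A \le 2n) + \alpha(n) \le d.$$

Let $s > 2n$ be the smallest integer such that

$$\mu(\tau_A \le s - 2n) \ge \sqrt{d}.$$

With this choice we have

$$\frac{\mu(\tau_A \le 2n) + \alpha(n)}{\mu(\tau_A \le s - 2n)} \le \frac{d}{\sqrt{d}} = \sqrt{d} \le \delta.$$

Furthermore, since $\mu(\tau_A \le s - 2n - 1) < \sqrt{d}$, it follows from invariance that

$$\mu(\tau_A \le s) \le \mu(\tau_A \le s - 2n - 1) + \mu(\tau_A \le 2n + 1) \le \sqrt{d} + 2d \le 3\sqrt{d} = \delta.$$

□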

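Proof of Theorem 7. Let $n$ be an integer and $A \in \mathcal{F}_0^{n-1}$. Let $\delta$ be as in Lemma 8. There is nothing to prove if $\delta \ge 1/4$, since then the right-hand side of the theorem equals $4\delta \ge 1$; so we suppose that $\delta < 1/4$. Take $s > 2n$ given by Lemma 8 such that (4) holds.

To simplify notation we drop the subscript $A$ and write $\tau = \tau_A$. Set $H(k) = \mu(\tau > k)$, and denote by $\tau^{[t]} = \tau \circ T^t$ the first occurrence time starting at time $t$. For any integer $j \ge 1$ consider the modulus

(5) $\left| H(js) - H((j-1)s)H(s-2n) \right|$.

The sets

$$\{\tau > js\} = \{\tau > (j-1)s\} \cap \{\tau^{[(j-1)s]} > s\}$$

and

$$\{\tau > (j-1)s\} \cap \{\tau^{[(j-1)s+2n]} > s - 2n\}$$

differ by a subset of $\{\tau^{[(j-1)s]} \le 2n\}$, whose measure is by invariance bounded by $\mu(\tau \le 2n)$. Furthermore, since $A \in \mathcal{F}_0^{n-1}$, the event $\{\tau > (j-1)s\}$ belongs to $\mathcal{F}_0^{(j-1)s+n-1}$ and the event $\{\tau^{[(j-1)s+2n]} > s - 2n\}$ belongs to $\mathcal{F}_{(j-1)s+2n+1}^\infty$, so by mixing and invariance we get that

$$\left| \mu\big(\tau > (j-1)s;\ \tau^{[(j-1)s+2n]} > s - 2n\big) - H((j-1)s)H(s-2n) \right| \le \alpha(n).$$

Thus the expression (5) is bounded by

$$\mu(\tau \le 2n) + \alpha(n).$$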
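Now, take $q$ a positive integer. The absolute value

(6) $\left| H(qs) - H(s-2n)^q \right|$

is bounded, by a telescoping argument (recall that $H(0) = 1$), by

$$\sum_{j=1}^{q} \left| H(js) - H((j-1)s)H(s-2n) \right| H(s-2n)^{q-j}.$$

We just proved that the modulus in the above sum is bounded by $\mu(\tau \le 2n) + \alpha(n)$. Summing the geometric series over $j$ we get that for every integer $q \ge 1$ the modulus in (6) is bounded by

$$\frac{\mu(\tau \le 2n) + \alpha(n)}{1 - H(s-2n)} = \frac{\mu(\tau \le 2n) + \alpha(n)}{\mu(\tau \le s-2n)} \le \delta,$$

by (4). Moreover, any non-negative integer $k$ can be written as $qs + r$ with $q = \lfloor k/s \rfloor$ and $0 \le r < s$. Then

(7) $|H(k) - H(qs)| = \mu\big(\tau > qs;\ \tau^{[qs]} \le r\big)$,

which, by invariance, is bounded by $\mu(\tau \le s) \le \delta$.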
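To finish the proof, set

$$\lambda(A) := \frac{-\ln H(s-2n)}{s\,\mu(A)},$$

which is positive since $H(s-2n) = 1 - \mu(\tau \le s-2n) < 1$, and note that, since $0 \le k/s - q < 1$, the Mean Value Theorem gives

(8) $\left| H(s-2n)^{k/s} - H(s-2n)^{q} \right| \le -\ln H(s-2n)$.

Note that $H(s-2n)^{k/s} = e^{-\lambda(A)\mu(A)k}$. By convexity we have $-\ln(1-u) \le u/(1-\delta)$ whenever $0 \le u \le \delta$; therefore

$$-\ln H(s-2n) \le \frac{\mu(\tau \le s-2n)}{1-\delta} \le \frac{\mu(\tau \le s)}{1-\delta} \le 2\delta.$$

Putting together the three estimates for (6), (7) and (8) gives $|H(k) - e^{-\lambda(A)\mu(A)k}| \le \delta + \delta + 2\delta = 4\delta$, which is the conclusion. Observe in addition that $\lambda(A) \le 1/(1-\delta) < 2$, since $\mu(\tau \le s) \le s\mu(A)$. □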
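The construction of $s$ and $\lambda(A)$ in Lemma 8 and in the proof above can be traced on a toy example where everything is explicit. Assume (our choice, for illustration only) a Bernoulli process, $n = 1$ and a cylinder $A = [a]$ of rank 1 with $\mu(A) = p$; then $\alpha(1) = 0$ and $\tau_A$ is geometric, $H(k) = (1-p)^k$. The hypothetical helper below takes $H$, $\mu(A)$, $n$ and $\alpha(n)$, computes $\delta$, the integer $s$ of Lemma 8 and $\lambda(A) = -\ln H(s-2n)/(s\mu(A))$; the check then compares $\sup_k |H(k) - e^{-\lambda(A)\mu(A)k}|$ with the bound $4\delta$ of Theorem 7.

```python
import math
from typing import Callable, Tuple


def lemma8_scale(H: Callable[[int], float], mu_a: float, n: int,
                 alpha_n: float) -> Tuple[float, int, float]:
    """Return (delta, s, lam) as in Lemma 8 and the proof of Theorem 7.

    H(k) = mu(tau_A > k) is supplied by the caller; the lemma requires delta < 1/4.
    """
    d = 2 * (1 - H(n)) + alpha_n              # d = 2 mu(tau_A <= n) + alpha(n)
    delta = 3 * math.sqrt(d)
    if delta >= 0.25:
        raise ValueError("Lemma 8 needs delta < 1/4")
    s = 2 * n
    while 1 - H(s - 2 * n) < math.sqrt(d):    # smallest s with mu(tau_A <= s - 2n) >= sqrt(d)
        s += 1
    lam = -math.log(H(s - 2 * n)) / (s * mu_a)
    return delta, s, lam


if __name__ == "__main__":
    p = 0.001                                 # mu(A) for a rank-1 cylinder
    H = lambda k: (1 - p) ** k                # geometric hitting time
    delta, s, lam = lemma8_scale(H, p, n=1, alpha_n=0.0)
    err = max(abs(H(k) - math.exp(-lam * p * k)) for k in range(20_000))
    print(delta, s, lam, err, 4 * delta)
```

In this toy case $\lambda(A) = (s-2)/s$ is slightly below 1, in line with Remark 9 below.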

Remark 9. The upper bound $\lambda(A) \le 2$ can be sharpened when $\delta$ is small. In particular, if $\delta_n \to 0$ as $n \to \infty$ we get $\limsup \lambda(A_n) \le 1$.

We conclude this section with the proof of the main theorem. In view of Theorem 10, the statement for hitting times in the main theorem (Theorem 1) and the one for return times are equivalent; hence it is sufficient to prove the first statement with $F(t) = 1 - e^{-t}$, which will imply the second statement with $G(s) = e^{-s}$.

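Proof of Theorem 1. For any real $t > 0$, taking $k = \lfloor t/(\lambda(A_n)\mu(A_n)) \rfloor$ in Theorem 7 gives

$$\left| \mu\big(\lambda(A_n)\mu(A_n)\tau_{A_n} > t\big) - e^{-t} \right| \le 12\sqrt{2\mu(\tau_{A_n} \le n) + \alpha(n)} + 2\mu(A_n),$$

where the last term accounts for $|e^{-\lambda(A_n)\mu(A_n)k} - e^{-t}| \le \lambda(A_n)\mu(A_n) \le 2\mu(A_n)$. This proves the first statement. The uniform convergence in (1) implies that of this upper bound, since

$$\mu(A_n) = \mu(\tau_{A_n} = 1) \le \mu(\tau_{A_n} \le n).$$

The second statement follows from Theorem 10. The third statement follows from Remark 9. □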

5. Hitting and returning: an adaptation of the Haydn-Lacroix-Vaienti theorem

Haydn, Lacroix and Vaienti [8] have proved that the asymptotic distributions of the hitting and return times $\tau_{A_n}$, rescaled by the measure $\mu(A_n)$, are related by an integral equation. Their result does not apply to our setting, because here the asymptotic distribution need not exist: the normalizing constant does not converge in general.

We now give a generalization of their result adapted to our case, which deserves a new proof since the technique needs to be rather different. Let

$$F_A(t) := \mu\big(\lambda(A)\mu(A)\tau_A \le t\big),$$

$$G_A(s) := \frac{1}{\lambda(A)}\, \mu\big(\lambda(A)\mu(A)\tau_A > s \,\big|\, A\big).$$
$F_A$ is the usual non-decreasing cumulative distribution function of the rescaled hitting time $\lambda(A)\mu(A)\tau_A$, while $G_A$ is a normalized non-increasing distribution function of the rescaled return time $\lambda(A)\mu(A)\tau_A$. We recall that, since $F_A$ and $G_A$ are monotone, their convergence on a dense set and their convergence at all but countably many points are equivalent, and we will simply say that they converge.

Theorem 10. Suppose that the measure preserving system $(\Sigma, T, \mu)$ is ergodic. Let $A_n$ be a sequence of measurable sets such that $\mu(A_n) \to 0$. If $F_{A_n}$ converges to $F$ as $n \to \infty$, then $G_{A_n}$ converges to some function $G$, and the limits are related by the integral equation

$$F(t) = F(0^+) + \int_0^t G(s)\,ds \qquad (\forall t > 0).$$

In particular, if the solution $G$ is continuous then the convergence is uniform on $[s, +\infty)$ for any $s > 0$.

Reciprocally, if $G_{A_n}$ converges to $G$ as $n \to \infty$ and $\int_0^\infty G(s)\,ds = 1$, then $F_{A_n}$ converges to some function $F$, and the limits are related by the same integral relation with $F(0^+) = 0$. In particular, $F$ is continuous on $[0, \infty)$ and the convergence is uniform.

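Proof. Let $A$ be any measurable set with $\mu(A) > 0$. Note that $0 \le F_A(t) \le 1$ and $0 \le G_A(s) \le 1/s$ for any $s > 0$, where this last upper bound follows from Markov's inequality and Kac's Lemma:

$$G_A(s) = \frac{1}{\lambda(A)}\, \mu\big(\lambda(A)\mu(A)\tau_A > s \,\big|\, A\big) \le \frac{1}{\lambda(A)} \cdot \frac{\lambda(A)\mu(A)}{s} \int \tau_A \, d\mu(\cdot \mid A) \le \frac{1}{s}.$$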

First observe that by invariance one has, for every integer $n \ge 1$,

$$\mu(\tau_A = n) = \mu(A \cap \{\tau_A \ge n\}).$$
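This identity is easy to get off by one, so here is a brute-force check of our own for the fair-coin Bernoulli measure (a stationary example chosen for illustration; the helper names are hypothetical). It computes both sides exactly with rational arithmetic by enumerating all words long enough to decide the two events.

```python
from fractions import Fraction
from itertools import product
from typing import Optional, Tuple


def first_hit(x: Tuple[int, ...], word: Tuple[int, ...]) -> Optional[int]:
    """First k >= 1 with x_k ... x_{k+r-1} == word inside the finite word x, else None."""
    r = len(word)
    for k in range(1, len(x) - r + 1):
        if x[k:k + r] == word:
            return k
    return None


def both_sides(word: Tuple[int, ...], n: int) -> Tuple[Fraction, Fraction]:
    """Return (mu(tau_A = n), mu(A and {tau_A >= n})) for A = [word] and a fair coin."""
    r = len(word)
    length = n + r  # coordinates 0 .. n + r - 1 decide both events
    weight = Fraction(1, 2 ** length)
    lhs = rhs = Fraction(0)
    for x in product((0, 1), repeat=length):
        k = first_hit(x, word)
        if k == n:
            lhs += weight
        if x[:r] == word and (k is None or k >= n):
            rhs += weight
    return lhs, rhs


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, *both_sides((1, 0, 1), n))
```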

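Therefore

$$F_A(t) = \sum_{n=1}^{\lfloor t/(\lambda(A)\mu(A)) \rfloor} \mu(A \cap \{\tau_A \ge n\}) = \int_0^{\lfloor t/(\lambda(A)\mu(A)) \rfloor} \mu(A \cap \{\tau_A > r\})\,dr.$$

Since $\mu(A \cap \{\tau_A > r\}) \le \mu(A)$, we get by the change of variable $r = s/(\lambda(A)\mu(A))$

$$F_A(t) \le \int_0^t G_A(s)\,ds \le F_A(t) + \mu(A).$$

For any $0 < t < t'$ we get the relation

(9) $$\int_t^{t'} G_A(s)\,ds - \mu(A) \le F_A(t') - F_A(t) \le \int_t^{t'} G_A(s)\,ds + \mu(A).$$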
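The two-sided bound $F_A(t) \le \int_0^t G_A(s)\,ds \le F_A(t) + \mu(A)$ can be checked in closed form in a toy case of our own choosing: a Bernoulli process and a rank-1 cylinder $A$ with $\mu(A) = p$, for which the hitting time and the return time are both geometric with parameter $p$, and we take $\lambda(A) = 1$. Then $F_A(t) = 1 - (1-p)^{\lfloor t/p \rfloor}$ and $G_A(s) = (1-p)^{\lfloor s/p \rfloor}$, and the integral of this step function has the closed form used below.

```python
import math


def F_toy(t: float, p: float) -> float:
    """F_A(t) = mu(p * tau_A <= t) for a geometric hitting time with parameter p."""
    m = math.floor(t / p)
    return 1 - (1 - p) ** m


def int_G_toy(t: float, p: float) -> float:
    """int_0^t G_A(s) ds for the step function G_A(s) = (1 - p) ** floor(s / p)."""
    m = math.floor(t / p)
    # m full steps of width p contribute 1 - (1-p)^m; the partial last step adds the rest
    return 1 - (1 - p) ** m + (t - m * p) * (1 - p) ** m


if __name__ == "__main__":
    p = 0.05
    for t in (0.01, 0.5, 1.0, 3.0):
        print(t, F_toy(t, p), int_G_toy(t, p), F_toy(t, p) + p)
```

The gap $\int_0^t G_A - F_A$ equals the contribution of the last partial step, which is at most $p = \mu(A)$, exactly as in the general bound.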

• Assume that $F_{A_n}$ converges to some function $F$ and suppose, for a contradiction, that $G_{A_n}$ does not converge. By Helly's selection principle, each subsequence of $(G_{A_n})$ has an accumulation point¹. Therefore $G_{A_n}$ must have at least two different accumulation points $G_1$ and $G_2$. By dominated convergence, (9) gives that for all $0 < t < t'$

(10) $$F(t') - F(t) = \int_t^{t'} G_i(s)\,ds \qquad (i = 1, 2).$$

Hence $G_1 = G_2$ a.e., a contradiction; thus $G_{A_n}$ converges. Lastly, the integral relation follows from (10) by monotone convergence.

¹Indeed, the space of non-increasing functions $g$ from $(0, \infty)$ to itself such that $g(s) \le 1/s$, under the equivalence relation of equality outside countable sets, is metrizable (e.g. by a slight modification of the Lévy metric) and compact (Helly selection principle), and an accumulation point refers to this notion of convergence.

• Assume that $G_{A_n}$ converges to some function $G$. By Fatou's lemma, the leftmost inequality in (9) gives that for all $t > 0$

$$\int_0^t G(s)\,ds \le \liminf_n F_{A_n}(t), \qquad \int_t^\infty G(s)\,ds \le \liminf_n \big(1 - F_{A_n}(t)\big);$$

therefore, under our assumption $\int_0^\infty G(s)\,ds = 1$, $F_{A_n}$ converges to $F$ and

$$F(t) = \int_0^t G(s)\,ds.$$

□